CUDA Programming Guide: The Shift from Latency Optimization to Throughput Orientation

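Computing has undergone a fundamental shift from latency-optimized central processing unit (CPU) designs to throughput-oriented graphics processing unit (GPU) architectures. A CPU is like a fast delivery motorcycle that moves a single item quickly, whereas a GPU is like a giant cargo ship: each item travels more slowly, but the ship carries 50,000 containers at once.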

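A CPU focuses on minimizing the completion time of a single instruction sequence, relying on techniques such as advanced branch prediction. A GPU, by contrast, is designed to maximize work per second by executing thousands of threads in parallel, trading single-thread speed for very high aggregate throughput.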

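Within a similar price and power envelope, a GPU offers much higher instruction throughput and memory bandwidth than a CPU. GPUs are specialized for highly parallel computation and devote more transistors to data processing units (arithmetic logic units, ALUs), whereas CPUs devote more transistors to data caching and flow control.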

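CUDA (Compute Unified Device Architecture) was introduced by NVIDIA in 2006. It is a parallel computing platform and programming model that exposes the GPU's computing power independently of graphics application programming interfaces (APIs), which can yield large performance gains for highly parallel workloads.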
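
To make the throughput-oriented model concrete, here is a minimal sketch of element-wise vector addition in CUDA, in which each GPU thread handles one element. The names `vector_add`, `check`, and `vector_add_on_gpu`, as well as the block size of 256 threads, are illustrative assumptions and do not come from this lesson. Only standard CUDA runtime calls are used.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>
#include <vector>

// Kernel: every thread adds exactly one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                     // guard the last, partially filled block
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: turn a failed CUDA runtime call into an exception.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
    }
}

// Hypothetical host wrapper: copy inputs to the GPU, launch one thread per
// element, and copy the result back. For brevity, device buffers are not
// freed if an error is thrown midway.
std::vector<float> vector_add_on_gpu(const std::vector<float>& a,
                                     const std::vector<float>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("input sizes differ");
    }
    const int n = static_cast<int>(a.size());
    std::vector<float> out(a.size());
    if (n == 0) {
        return out;
    }
    const size_t bytes = a.size() * sizeof(float);

    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_a, bytes), "cudaMalloc d_a");
    check(cudaMalloc(&d_b, bytes), "cudaMalloc d_b");
    check(cudaMalloc(&d_out, bytes), "cudaMalloc d_out");

    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    const int threads_per_block = 256;  // assumed block size, not tuned
    const int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    vector_add<<<blocks, threads_per_block>>>(d_a, d_b, d_out, n);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost), "copy out");

    check(cudaFree(d_a), "cudaFree d_a");
    check(cudaFree(d_b), "cudaFree d_b");
    check(cudaFree(d_out), "cudaFree d_out");
    return out;
}
```

Calling `vector_add_on_gpu` on two vectors of 1,048,576 elements launches 4,096 blocks of 256 threads, about one million threads in total. Each thread does very little work, but the GPU keeps many of them in flight at once, which is the cargo-ship trade-off described above. The file would typically be compiled with `nvcc` and needs a `main` function that calls the wrapper.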

QUESTION 1

Which component consumes the majority of silicon real estate in a traditional CPU?

Arithmetic Logic Units (ALUs)

Control logic and Data Caching

Floating Point Units

Memory Controllers

QUESTION 2

What was the original purpose of the GPU before CUDA?

General purpose scientific computing

Operating system kernel management

Fixed-function hardware for 3D rendering

High-frequency trading

QUESTION 3

In the cargo ship analogy, what represents the 'Throughput'?

The speed at which the ship moves across the ocean.

The total volume of containers delivered at once.

The size of the ship's engine.

The fuel efficiency per container.

QUESTION 4

What is the primary trade-off made by GPUs to achieve high aggregate throughput?

Higher power consumption per unit.

Lower single-thread performance.

Reduced memory bandwidth.

Simplified mathematical precision.

QUESTION 5

Which NVIDIA software component is required to run CUDA applications?

DirectX 12

NVIDIA Driver and CUDA Toolkit

OpenGL Wrapper

Windows GDI+